Agents Robust to Distribution Shifts Learn Causal World Models Even Under Mediation

Neural Information Processing Systems

In this work, we prove that agents capable of adapting to distribution shifts must have learned the causal model of their environment even in the presence of mediation. This term describes situations where an agent's actions affect its environment, a dynamic common to most real-world settings. For example, a robot in an industrial plant might interact with tools, move through space, and transform products to complete its task. We introduce an algorithm for eliciting causal knowledge from robust agents using optimal policy oracles, with the flexibility to incorporate prior causal knowledge. We further demonstrate its effectiveness in mediated single-agent scenarios and multi-agent environments. We identify conditions under which the presence of a single robust agent is sufficient to recover the full causal model and derive optimal policies for other agents in the same environment. Finally, we show how to apply these results to sequential decision-making tasks modeled as Partially Observable Markov Decision Processes (POMDPs).


Differential Privacy without Sensitivity

Neural Information Processing Systems

The exponential mechanism is a general method to construct a randomized estimator that satisfies (ε,0)-differential privacy. Recently, Wang et al. showed that the Gibbs posterior, which is a data-dependent probability distribution that contains the Bayesian posterior, is essentially equivalent to the exponential mechanism under certain boundedness conditions on the loss function. While the exponential mechanism provides a way to build an (ε,0)-differentially private algorithm, it requires boundedness of the loss function, which is quite stringent for some learning problems. In this paper, we focus on (ε,δ)-differential privacy of Gibbs posteriors with convex and Lipschitz loss functions. Our result extends the classical exponential mechanism, allowing the loss functions to have an unbounded sensitivity.
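
As a rough illustration of the classical exponential mechanism mentioned above (not the paper's (ε,δ) result), the sketch below samples a parameter from a finite candidate grid with probability proportional to exp(-ε · loss / (2Δ)). The function names, the candidate grid, the absolute-error loss, and the sensitivity bound Δ = 1 (which assumes data and candidates lie in [0, 1]) are all assumptions made for this example.

```python
import math
import random


def gibbs_probabilities(candidates, data, loss, epsilon, sensitivity):
    """Selection probabilities proportional to exp(-epsilon * total_loss / (2 * sensitivity))."""
    scores = [
        -epsilon * sum(loss(c, x) for x in data) / (2.0 * sensitivity)
        for c in candidates
    ]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates, data, loss, epsilon, sensitivity, rng=random):
    """Draw one candidate at random according to the probabilities above."""
    probs = gibbs_probabilities(candidates, data, loss, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


def abs_loss(theta, x):
    # Bounded in [0, 1] when theta and x are in [0, 1], so replacing one
    # record changes the total loss by at most 1 (assumed sensitivity).
    return abs(theta - x)


# Hypothetical candidate grid on [0, 1].
candidates = [i / 10 for i in range(11)]
```

Calling `exponential_mechanism(candidates, [0.1, 0.5, 0.9], abs_loss, 1.0, 1.0)` returns one grid point at random, with low-loss points favored. The boundedness of `abs_loss` is exactly the kind of condition the abstract describes as stringent; an unbounded loss would have no finite `sensitivity` to plug in here.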




A Guide Through the Zoo of Biased SGD

Neural Information Processing Systems

We also provide examples where biased estimators outperform their unbiased counterparts or where unbiased versions are simply not available. Finally, we demonstrate the effectiveness of our framework through experimental results that validate our theoretical findings.


A Baseline algorithms

Neural Information Processing Systems

The following theorem is a more general version of Theorem 5.1. Assume that Assumptions 1 to 3 hold. Note that the only difference between Theorem B.1 and Theorem 5.1 lies in [...] That is, the "oldest" response used to update [...] By Jensen's inequality and L-smoothness, we have [...] In order for the paper to be self-contained, we restate the proof here. The following lemma is slightly modified from Lemma 8 in [18]. By Lemma B.1, we have [...] Combining Appendix B.3.1 and Appendix B.3.2, we have [...]

B.4 Deriving the convergence bound. In this subsection, we obtain Theorem B.1 based on the descent lemma.





BC-ADMM: An Efficient Non-convex Constrained Optimizer with Robotic Applications

arXiv.org Artificial Intelligence

Non-convex constrained optimizations are ubiquitous in robotic applications such as multi-agent navigation, UAV trajectory optimization, and soft robot simulation. For this problem class, conventional optimizers suffer from small step sizes and slow convergence. We propose BC-ADMM, a variant of the Alternating Direction Method of Multipliers (ADMM) that can solve a class of non-convex constrained optimizations with biconvex constraint relaxation. Our algorithm allows larger step sizes by breaking the problem into small-scale sub-problems that can be easily solved in parallel. We show that our method has both theoretical convergence speed guarantees and practical convergence guarantees in the asymptotic sense. Through numerical experiments across four robotic applications, we show that BC-ADMM converges faster than conventional gradient descent and Newton's method in terms of wall-clock time.
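
For readers unfamiliar with ADMM, the sketch below shows the standard two-block ADMM iteration on a small convex toy problem; it is not BC-ADMM and does not involve biconvex relaxation. The problem (minimize 0.5·||x − a||² + λ·||z||₁ subject to x = z), the function names, and the parameter values are assumptions chosen because every sub-problem separates per coordinate, which gives a feel for the "small sub-problems solved in parallel" structure the abstract refers to.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def admm_l1_denoise(a, lam, rho=1.0, iters=200):
    """Standard scaled-form ADMM for min 0.5*||x - a||^2 + lam*||z||_1 s.t. x = z."""
    n = len(a)
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable
    for _ in range(iters):
        # x-update: closed-form minimizer of a quadratic, independent per coordinate
        x = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: soft-thresholding, also independent per coordinate
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # dual update: accumulate the constraint residual x - z
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z
```

For this toy problem the exact solution is the soft-thresholding of `a` by `lam`, so the iterate can be checked against it directly. Non-convex constraints of the kind the abstract targets do not admit such closed-form updates in general.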